Social Events
They Fell in Love Playing 'Minecraft.' Then the Game Became Their Wedding Venue
On a crisp Saturday in March, beneath a canopy of pixelated cherry blossoms, two avatars stood in front of a digital altar crafted from shimmering quartz blocks and flickering redstone torches. They were surrounded by a sprawling Minecraft village, complete with custom-coded NPCs reciting lore about the couple's decade-long digital courtship. Nearby, pixelated foxes darted between guests--each one logged in from across the world, dressed in custom skins as forest druids and rogue mages. After the vows (typed and read aloud on Discord), guests dispersed for side quests, scavenger hunts, and an enchanted maze culminating in a virtual fireworks show. This wasn't a rehearsal for an in-person wedding--this was the wedding.
Revisiting the Predictability of Performative, Social Events
Social predictions do not passively describe the future; they actively shape it. They inform actions and change individual expectations in ways that influence the likelihood of the predicted outcome. Given these dynamics, to what extent can social events be predicted? This question was discussed throughout the 20th century by authors like Merton, Morgenstern, Simon, and others who considered it a central issue in social science methodology. In this work, we provide a modern answer to this old problem. Using recent ideas from performative prediction and outcome indistinguishability, we establish that one can always efficiently predict social events accurately, regardless of how predictions influence data. While achievable, we also show that these predictions are often undesirable, highlighting the limitations of previous desiderata. We end with a discussion of various avenues forward.
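To make the feedback loop concrete, here is a toy sketch (not the paper's construction) in which the announced prediction shifts the outcome probability through an assumed response function; iterating the announcement converges to a performatively stable prediction that matches the rate it induces.

```python
# Toy sketch of a performative feedback loop. The response function theta is
# a made-up assumption for illustration, not anything from the paper.

def theta(p: float) -> float:
    """Hypothetical outcome probability once prediction p is announced."""
    return min(1.0, max(0.0, 0.3 + 0.4 * p))

def stable_prediction(n_iters: int = 50) -> float:
    """Iterate p <- theta(p); a fixed point is a prediction equal to the rate it induces."""
    p = 0.9  # arbitrary starting announcement
    for _ in range(n_iters):
        p = theta(p)
    return p

if __name__ == "__main__":
    p_star = stable_prediction()
    print(f"stable prediction: {p_star:.3f}, induced rate: {theta(p_star):.3f}")
```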
Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks
Zhilin Wang, Jiaqi Zeng, Olivier Delalleau, Daniel Egert, Ellie Evans, Hoo-Chang Shin, Felipe Soares, Yi Dong, Oleksii Kuchaiev
Inference-time scaling has been critical to the success of recent models such as OpenAI o1 and DeepSeek R1. However, many techniques used to train models for inference-time scaling require tasks whose answers can be verified, limiting their application to domains such as math, coding, and logical reasoning. We take inspiration from how humans make first attempts, ask others for detailed feedback, and improve based on that feedback across a wide spectrum of open-ended endeavors. To this end, we collect data for and train dedicated Feedback and Edit Models that can perform inference-time scaling for open-ended general-domain tasks. In our setup, one model generates an initial response, a second model provides feedback on it, and a third model uses that feedback to edit the response. We show that performance on Arena Hard, a benchmark strongly predictive of Chatbot Arena Elo, can be boosted by scaling the number of initial response drafts, effective feedback, and edited responses. When scaled optimally, our setup based on 70B models from the Llama 3 family reaches SoTA performance on Arena Hard at 92.7 as of 5 Mar 2025, surpassing OpenAI o1-preview-2024-09-12 (90.4) and DeepSeek R1 (92.3).
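The three-stage setup can be sketched schematically as below; the model-call functions, scoring function, and draft/edit counts are placeholders standing in for the paper's trained Feedback and Edit Models, not their actual implementation.

```python
# Schematic of the draft -> feedback -> edit loop described in the abstract.
# All callables are placeholders (e.g., wrappers around your own inference
# endpoint); names and the best-of-N scoring are assumptions.
from typing import Callable, List

def feedback_edit_pipeline(
    prompt: str,
    generate: Callable[[str], str],            # initial-response model
    give_feedback: Callable[[str, str], str],  # feedback model
    edit: Callable[[str, str, str], str],      # edit model
    score: Callable[[str, str], float],        # any reward/judge model
    n_drafts: int = 4,
    n_edits: int = 4,
) -> str:
    """Scale inference by sampling several drafts, collecting feedback on
    the best one, and sampling several edited revisions."""
    drafts: List[str] = [generate(prompt) for _ in range(n_drafts)]
    best_draft = max(drafts, key=lambda d: score(prompt, d))
    feedback = give_feedback(prompt, best_draft)
    edits = [edit(prompt, best_draft, feedback) for _ in range(n_edits)]
    return max(edits, key=lambda e: score(prompt, e))
```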
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking
Florian Schneider, Carolin Holtermann, Chris Biemann, Anne Lauscher
Large Vision-Language Models (LVLMs) have recently gained attention due to their strong performance and broad applicability. While prior work has shown that their efficacy falls short in non-Western contexts, existing studies are limited in scope: they cover only a narrow range of cultures, focus on a small number of cultural aspects, or evaluate a limited selection of models on a single task. Towards globally inclusive LVLM research, we introduce GIMMICK, an extensive multimodal benchmark designed to assess a broad spectrum of cultural knowledge across 144 countries representing six global macro-regions. GIMMICK comprises six tasks built upon three new datasets that span 728 unique cultural events or facets, on which we evaluate 20 LVLMs and 11 LLMs, including 5 proprietary and 26 open-weight models of all sizes. We systematically examine (1) regional cultural biases, (2) the influence of model size, (3) input modalities, and (4) external cues. Our analyses reveal strong biases toward Western cultures across models and tasks, and highlight strong correlations between model size and performance, as well as the effectiveness of multimodal input and external geographic cues. We further find that models have more knowledge of tangible than intangible cultural aspects (e.g., food vs. rituals) and that they excel at recognizing broad cultural origins but struggle with more nuanced understanding.
Multi-Order Hyperbolic Graph Convolution and Aggregated Attention for Social Event Detection
Yao Liu, Zhilan Liu, Tien Ping Tan, Yuxin Li
Social event detection (SED) is the task of identifying specific real-world events and has broad applications across various domains. It is integral to many mobile applications with social features, including major platforms like Twitter, Weibo, and Facebook. By enabling the analysis of social events, SED provides valuable insights that help businesses understand consumer preferences and support public services in emergency handling and disaster management. Because event detection data are hierarchically structured, traditional approaches in Euclidean space often fail to capture the complexity of these relationships. While existing methods in both Euclidean and hyperbolic spaces have shown promising results, they tend to overlook multi-order relationships between events. To address these limitations, this paper introduces a novel framework, Multi-Order Hyperbolic Graph Convolution with Aggregated Attention (MOHGCAA), designed to enhance SED performance. Experimental results demonstrate significant improvements under both supervised and unsupervised settings. To further validate the effectiveness and robustness of the proposed framework, we conduct extensive evaluations across multiple datasets, confirming its superiority in tackling common challenges in social event detection.
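As a rough illustration of the ingredients named in the title, the sketch below lifts node features to the Poincaré ball and aggregates 1-hop and 2-hop (multi-order) neighbourhoods in the tangent space at the origin, with uniform weights standing in for the learned aggregated attention; it is not the MOHGCAA implementation.

```python
# Minimal numpy sketch of multi-order aggregation in hyperbolic space
# (illustrative only). Uniform order weights stand in for learned attention.
import numpy as np

def exp0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin of the Poincare ball (Euclidean -> hyperbolic)."""
    n = np.linalg.norm(v, axis=-1, keepdims=True) + eps
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def log0(x, c=1.0, eps=1e-9):
    """Logarithmic map at the origin (hyperbolic -> tangent space)."""
    n = np.linalg.norm(x, axis=-1, keepdims=True) + eps
    return np.arctanh(np.clip(np.sqrt(c) * n, 0.0, 1.0 - eps)) * x / (np.sqrt(c) * n)

def multi_order_hyp_agg(X, A, orders=(1, 2)):
    """X: node features (N, d); A: adjacency (N, N). Returns hyperbolic embeddings."""
    A_hat = A + np.eye(len(A))                         # add self-loops
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)   # row-normalise
    T = log0(exp0(X))                                  # lift, then aggregate in tangent space
    per_order = [np.linalg.matrix_power(A_hat, k) @ T for k in orders]
    w = np.full(len(orders), 1.0 / len(orders))        # placeholder attention weights
    mixed = sum(wi * Pk for wi, Pk in zip(w, per_order))
    return exp0(mixed)                                 # project back to the Poincare ball
```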
SocialED: A Python Library for Social Event Detection
Kun Zhang, Xiaoyan Yu, Pu Li, Hao Peng, Philip S. Yu
SocialED is a comprehensive, open-source Python library designed to support social event detection (SED) tasks, integrating 19 detection algorithms and 14 diverse datasets. It provides a unified API with detailed documentation, offering researchers and practitioners a complete solution for event detection in social media. The library is designed with modularity in mind, allowing users to easily adapt and extend components for various use cases. SocialED supports a wide range of preprocessing techniques, such as graph construction and tokenization, and includes standardized interfaces for training models and making predictions. By integrating popular deep learning frameworks, SocialED ensures high efficiency and scalability across both CPU and GPU environments. The library adheres to high code-quality standards, including unit testing, continuous integration, and code coverage, ensuring that it delivers robust, maintainable software.
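The kind of standardized train/predict interface the abstract describes might look roughly like the hypothetical sketch below; the class and method names are illustrative assumptions rather than SocialED's actual API, which is documented in its repository.

```python
# Hypothetical sketch of a unified fit/detect interface for SED algorithms.
# Names are assumptions for illustration, not SocialED's real API.
from abc import ABC, abstractmethod
from typing import List

class EventDetector(ABC):
    """Standardised interface: train on messages, then assign event ids."""

    @abstractmethod
    def fit(self, messages: List[str]) -> "EventDetector":
        ...

    @abstractmethod
    def detect(self, messages: List[str]) -> List[int]:
        """Return an event id per message."""
        ...

class KeywordBaseline(EventDetector):
    """Toy baseline: bucket each message by its most frequent token."""

    def fit(self, messages: List[str]) -> "EventDetector":
        vocab = sorted({tok for m in messages for tok in m.lower().split()})
        self.index = {tok: i for i, tok in enumerate(vocab)}
        return self

    def detect(self, messages: List[str]) -> List[int]:
        labels = []
        for m in messages:
            toks = m.lower().split()
            top = max(toks, key=toks.count) if toks else None
            labels.append(self.index.get(top, -1))
        return labels
```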
Towards Effective, Efficient and Unsupervised Social Event Detection in the Hyperbolic Space
Xiaoyan Yu, Yifan Wei, Shuaishuai Zhou, Zhiwei Yang, Li Sun, Hao Peng, Liehuang Zhu, Philip S. Yu
The vast, complex, and dynamic nature of social message data has posed challenges for social event detection (SED). Despite considerable effort, these challenges persist, often resulting in inadequately expressive message representations (ineffective) and prolonged learning durations (inefficient). In response to these challenges, this work introduces an unsupervised framework, HyperSED (Hyperbolic SED). Specifically, the proposed framework first models social messages as semantic-based message anchors, and then leverages the structure of the anchor graph and the expressiveness of hyperbolic space to acquire structure- and geometry-aware anchor representations. Finally, HyperSED builds the partitioning tree of the anchor message graph by incorporating differentiable structural information, with the tree reflecting the detected events. Extensive experiments on public datasets demonstrate HyperSED's competitive performance, along with a substantial improvement in efficiency compared to the current state-of-the-art unsupervised paradigm. Statistically, HyperSED boosts incremental SED by an average of 2%, 2%, and 25% in NMI, AMI, and ARI, respectively, and improves efficiency by up to 37.41x (and at least 12.10x), illustrating the advancement of the proposed framework. Our code is publicly available at https://github.com/XiaoyanWork/HyperSED.
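The reported NMI, AMI, and ARI scores compare detected event assignments against ground-truth event labels; with made-up labels, the computation looks like this:

```python
# Quick sketch of how the reported clustering metrics (NMI, AMI, ARI) are
# computed for detected events versus ground truth; the labels below are
# made up purely for illustration.
from sklearn.metrics import (
    normalized_mutual_info_score,
    adjusted_mutual_info_score,
    adjusted_rand_score,
)

true_events = [0, 0, 1, 1, 2, 2]   # ground-truth event id per message
detected    = [0, 0, 1, 2, 2, 2]   # event id assigned by a detector

print("NMI:", normalized_mutual_info_score(true_events, detected))
print("AMI:", adjusted_mutual_info_score(true_events, detected))
print("ARI:", adjusted_rand_score(true_events, detected))
```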
Gemini vs. ChatGPT: Which one planned my wedding better?
I was all about the wedding bells after getting engaged in June, but after seeing some of these wedding venue quotes, it's more like alarm bells. "Ding-dong" has been remixed to "cha-ching" – and I need help. I don't even know how to begin wedding planning. What are the first steps? What do I need to prioritize first?
'I find them quite magical': the UK's obsession with weather apps
Several times a day, Francesca Simon, the author of the Horrid Henry children's books, gets out her phone to check the weather – not just for where she is, but also for where friends and family live, where she has been on holiday, and where she was brought up. "I find them quite magical," she said. With about 10 locations logged, her friends make fun of her "weather porn" habit. This week, Simon discovered she shared a weather app fixation with Queen Camilla when the pair discussed a miserable summer's day at a charity event. "[Camilla] said everybody teases her … so we were laughing at our mutual obsession," Simon said. It is an obsession shared by millions. If you are going on holiday, planning a summer barbecue, worrying about your garden or suffering from hay fever, you are likely to check an app at least daily for the latest forecast. The apps give much more localised and detailed information than traditional weather forecasts, including wind speeds and the percentage chance of rain, in ...
Language Models Represent Beliefs of Self and Others
Wentao Zhu, Zhining Zhang, Yizhou Wang
Understanding and attributing mental states, known as Theory of Mind (ToM), is a fundamental capability for human social reasoning. While Large Language Models (LLMs) appear to possess certain ToM abilities, the mechanisms underlying these capabilities remain elusive. In this study, we find that the belief status from the perspectives of various agents can be linearly decoded from the neural activations of language models, indicating the existence of internal representations of self and others' beliefs. By manipulating these representations, we observe dramatic changes in the models' ToM performance, underscoring their pivotal role in the social reasoning process. Additionally, our findings extend to diverse social reasoning tasks involving different causal inference patterns, suggesting the potential generalizability of these representations.
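The linear-decoding idea can be illustrated with a simple probe: fit a logistic regression on activations containing a planted belief direction and check that the label is recoverable. The activations below are synthetic stand-ins; in the paper's setting they would come from a language model's internal states on ToM prompts.

```python
# Minimal sketch of a linear probe for belief status. Activations are
# synthetic stand-ins with a planted linear "belief" direction.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden_dim, n_examples = 64, 200

belief = rng.integers(0, 2, size=n_examples)   # 0 = false belief, 1 = true belief
direction = rng.normal(size=hidden_dim)        # planted belief direction
acts = rng.normal(size=(n_examples, hidden_dim)) + np.outer(belief, direction)

probe = LogisticRegression(max_iter=1000).fit(acts[:150], belief[:150])
print("held-out probe accuracy:", probe.score(acts[150:], belief[150:]))
```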